Branch-prediction driven instruction prefetch

ABSTRACT

The invention provides a method and apparatus for optimizing instruction prefetch and caching in a processor. In the preferred embodiment, a path prediction circuit maintains information about which cache lines are likely to be executed in the future. This information is used to independently fetch the predicted cache lines, store them in a prefetch queue, and load them into the instruction cache as instructions contained in these lines are about to be decoded by the processor. A plurality of cache lines can be simultaneously in flight from main memory to the prefetch queue.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to a method and an apparatus for instruction caching in computer processors.

[0003] 2. Description of Related Art

[0004] Known instruction memory caching schemes for computer processors use cache memory to improve processor efficiency. Typically, when an instruction is fetched by the processor, an instruction cache is accessed to determine whether a copy of the memory holding the instruction is in the cache. If so, the instruction is provided to the processor from the instruction cache. If not, the main memory is accessed and a portion of the contents of the main memory that contains the instruction is copied to the instruction cache. The copied information is referred to as a cache line.

[0005] Because the instruction execution path is likely to continue sequentially and because instructions are often repeatedly executed, once the cache line is cached, the processor need not access main memory so long as the instructions being executed are from cache lines resident in the instruction cache. Thus, caching instructions reduces processor delays that would otherwise result from main memory fetches.

[0006] One problem which has arisen in the art is that instruction caching does not avoid all instruction memory access delays. One reason for this is that when sequential instruction execution reaches the end of a cache line, the subsequent cache line must be fetched from instruction memory if the subsequent cache line is not already in the instruction cache. Waiting for the subsequent cache line stalls the processor. Another reason for processor stalls is that branch instructions alter the sequential instruction fetch sequence within the instruction execution path. Thus, the cache line that contains the next instruction to be executed after a branch instruction may not be resident in the instruction cache, forcing prior-art designs to fetch the target instruction from main memory instead of from the instruction cache.

[0007] Both of these situations invoke a main memory fetch, and the resulting delays are much longer than those incurred by fetches from the instruction cache. The fetch to main memory thus delays the processing of the instruction execution path until the fetch for the cache line containing the needed instruction is completed.

[0008] One skilled in the art will understand that the main memory may itself be cached (for example, by a level 2 cache). However, even a main memory cache is slower than the instruction cache.

[0009] Another problem is that only one cache line is read from memory into the instruction cache at a time, and during the fetch the instruction cache cannot be accessed to obtain instructions. Thus, if a subsequently accessed cache line also requires a linefill from main memory, the processor incurs an additional delay for a second linefill request from main memory after it has fetched all needed instructions from the first line resident in the instruction cache.

[0010] Accordingly, it would be desirable to provide a caching scheme that predicts and prefetches a number of cache lines expected to be needed in the future, overlapping this prefetch with other processor activities and thus minimizing the amount of time the instruction cache is unavailable to the processor.

SUMMARY OF THE INVENTION

[0011] The invention provides a method and apparatus for optimizing instruction prefetch and caching in a processor. In the preferred embodiment, a path prediction circuit maintains information about which cache lines are likely to be executed in the future. This information is used to independently fetch the predicted cache lines, store them in a prefetch queue, and load them into the instruction cache as instructions contained in these lines are about to be decoded by the processor.

[0012] The foregoing and many other aspects of the present invention will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments that are illustrated in the various drawing figures.

DESCRIPTION OF THE DRAWINGS

[0013] FIG. 1 illustrates an instruction processing architecture in accordance with a preferred embodiment;

[0014] FIG. 2 illustrates a method for loading the prefetch queue using the instruction processing architecture of FIG. 1;

[0015] FIG. 3 illustrates a method for loading the instruction cache from the prefetch queue using the instruction processing architecture of FIG. 1;

[0016] FIG. 4 illustrates a method for executing instructions from the instruction cache using the instruction processing architecture of FIG. 1.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0017] FIG. 1 illustrates an instruction processing architecture, indicated by general reference character 100, that includes a memory system 101, a prefetch queue 103, an instruction cache (data) memory 105, an instruction parse/decode logic 107, and an instruction execute logic 109. Cache lines are fetched from the memory system 101 and cached in the instruction cache (data) memory 105 until instructions fetched from these cache lines are parsed and decoded by the instruction parse/decode logic 107 and executed by the instruction execute logic 109. The instruction cache (data) memory 105 is organized so as to hold one or more instruction cache lines from the memory system 101.
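
For concreteness, the following Python sketch models the major blocks of FIG. 1 as software objects. This is a hypothetical behavioral rendering, not the circuit itself: the class names, the dictionary-based cache organization, and the 32-byte line size are assumptions made for illustration.

```python
# Hypothetical behavioral model of the FIG. 1 blocks; all names and sizes
# are illustrative assumptions, not taken from the patent.
from dataclasses import dataclass, field
from typing import Optional

LINE_SIZE = 32  # assumed bytes per instruction cache line


@dataclass
class CacheLine:
    address: int  # line-aligned address within the memory system 101
    data: bytes   # the instruction bytes carried by the line


@dataclass
class InstructionCache:
    """Models the tag memory 121 and data memory 105 as one lookup table."""
    lines: dict = field(default_factory=dict)  # line address -> CacheLine

    def lookup(self, addr: int) -> Optional[CacheLine]:
        return self.lines.get(addr & ~(LINE_SIZE - 1))

    def fill(self, line: CacheLine) -> None:
        self.lines[line.address] = line  # 'cache line data' signal 133


@dataclass
class PrefetchQueue:
    """Models the prefetch queue 103: ordered entries awaiting transfer."""
    entries: list = field(default_factory=list)


icache = InstructionCache()
icache.fill(CacheLine(address=0x1000, data=b"\x90" * LINE_SIZE))
assert icache.lookup(0x1010).address == 0x1000  # hit within the same line
```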

[0018] The instruction execute logic 109 gathers information relating to the results of execution of control transfer instructions. This information is passed over an update/correct predictor path 111 to a path predictor logic 113, which adjusts its predictors dependent on the outcome of the execution. The processing of instruction data from the instruction cache (data) memory 105 by the instruction parse/decode logic 107 and the instruction execute logic 109 is much faster than the time required to fetch an instruction cache line from the memory system 101. One of the goals of the invention is to preload the prefetch queue 103 so that a missing instruction cache line can be readily loaded into the instruction cache (data) memory 105 when it is needed; thus, the instruction parse/decode logic 107 and the instruction execute logic 109 do not stall waiting for instruction data if the next cache line accessed by the program flow misses in the instruction cache.

[0019] The path predictor logic 113 communicates with a prefetch pointer logic 115 that includes a prefetch pointer and control logic for performing the prefetch operations described herein. The prefetch pointer logic 115 provides an ‘advance predictor’ signal 116 to the path predictor logic 113 when the prefetch pointer logic 115 finishes processing the sequentially fetched instructions contained in one cache line. The path predictor logic 113 responds to the ‘advance predictor’ signal 116 by providing a ‘new prefetch pointer’ signal 117 responsive to the past execution history. The ‘new prefetch pointer’ signal 117 includes the predicted address of an upcoming instruction; the upper bits of this address represent the address of the cache line that is expected to be needed. Thus, as the prefetch pointer logic 115 becomes able to initiate a linefill, the prefetch pointer is advanced along the instruction execution path.

[0020] One example of the path predictor logic 113 is provided by U.S. patent application Ser. No. 09/429,590, filed Oct. 28, 1999, entitled BLOCK-BASED BRANCH TARGET BUFFER, hereby incorporated by reference in its entirety.

[0021] An instruction execution path is a sequence of addresses of executed instructions. Thus, given an address and an execution history, the path predictor logic 113 can predict the instruction execution path for the execution of subsequent instructions. The instruction execution path can be represented by a sequence of instruction cache lines. One skilled in the art will understand that this arrangement of the path predictor logic 113 and the prefetch pointer logic 115 allows the prediction of which instruction cache lines are to be executed based on the addresses of the cache lines being fetched.

[0022] A ‘current prefetch pointer’ signal 119 is provided to an instruction cache (tag) memory 121. The instruction cache (tag) memory 121 determines whether the instruction cache line containing the instruction at the ‘current prefetch pointer’ signal 119 is already in the instruction cache (data) memory 105. An LRU memory 123 is used when a cache miss occurs to identify which cache way the missing line is to be written to. The instruction cache (tag) memory 121 sends a ‘hit/miss, way number’ signal 125 to the prefetch pointer logic 115 and the prefetch queue 103. The prefetch pointer logic 115 uses the ‘hit/miss, way number’ signal 125 to determine whether to originate a linefill request. If the instruction cache line is neither currently cached nor in the process of being fetched, the prefetch pointer logic 115 sends a ‘linefill request’ signal 127 to the memory system 101, which eventually responds with memory data 129. The memory data 129 supplied by the memory system 101 flows into the prefetch queue 103, where it is accumulated. After control information about one instruction cache line is loaded into the prefetch queue 103, another prefetch lookup operation can be initiated by the prefetch pointer logic 115. Thus, the prefetch pointer logic 115 initiates as many linefill requests to the memory system 101 as possible. One skilled in the art will understand that some number of instruction cache lines within the predicted instruction execution path can thus be selected. The upcoming instruction cache lines are those in the instruction execution path that contain instructions that are expected to be executed relatively soon.
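
The lookup-and-linefill flow of this paragraph can be sketched as follows. This is a hedged illustration assuming a two-way set-associative cache with true-LRU replacement; the cache geometry and the names (split_address, prefetch_lookup, pending_lines) are invented for the example.

```python
# Hedged sketch of the prefetch lookup of paragraph [0022]; the cache
# geometry and all names here are assumptions made for illustration.
LINE_SIZE, NUM_SETS, NUM_WAYS = 32, 64, 2

tags = {s: [None] * NUM_WAYS for s in range(NUM_SETS)}     # tag memory 121
lru = {s: list(range(NUM_WAYS)) for s in range(NUM_SETS)}  # LRU memory 123
pending_lines = set()  # line addresses with an outstanding linefill request


def split_address(addr):
    """Decompose an instruction address into (set index, tag)."""
    line = addr // LINE_SIZE
    return line % NUM_SETS, line // NUM_SETS


def prefetch_lookup(prefetch_pointer):
    """Returns the ('hit/miss', way number) pair of signal 125."""
    set_idx, tag = split_address(prefetch_pointer)
    if tag in tags[set_idx]:
        return "HIT", tags[set_idx].index(tag)  # line already cached
    victim_way = lru[set_idx][0]                # least recently used way
    line_addr = prefetch_pointer - prefetch_pointer % LINE_SIZE
    if line_addr not in pending_lines:
        pending_lines.add(line_addr)            # 'linefill request' 127
    return "MISS", victim_way


print(prefetch_lookup(0x1000))  # -> ('MISS', 0) on a cold cache
```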

[0023] The prefetch queue 103 includes a data portion 135 and a control portion 137 organized into a plurality of entries. A prefetch queue entry 139 in the prefetch queue 103 is but one of the entries that can be contained by the prefetch queue 103; the prefetch queue entry 139 (as shown) is located at the head of the prefetch queue 103. An instruction cache line from the memory system 101 is read into the data portion 135 of each queued entry as it arrives from memory. The control portion 137 of each queued entry stores the status and address of the instruction cache line as received from the instruction cache (tag) memory 121. The address of the cache line is obtained from the prefetch pointer logic 115, and the status contains the ‘hit/miss, way number’ signal 125 from the instruction cache (tag) memory 121 and the LRU memory 123 associated with the instruction cache line.
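
One plausible layout for such an entry, with the control portion 137 and data portion 135 as described above, is sketched below; the field names are assumptions for illustration only.

```python
# Hypothetical layout of one prefetch queue entry per paragraph [0023];
# field names are invented for illustration.
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefetchQueueEntry:
    # control portion 137
    line_address: int  # from the prefetch pointer logic 115
    hit: bool          # HIT/MISS half of signal 125
    way: int           # hit way, or the victim way from the LRU memory 123
    # data portion 135: the line itself, or None until the linefill
    # completes (and unused entirely when hit is True)
    line_data: Optional[bytes] = None

    def ready(self) -> bool:
        """True once this entry can be processed at the head of the queue."""
        return self.hit or self.line_data is not None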

[0024] Once the instruction cache line is completely fetched and stored in the prefetch queue 103, it is eventually transferred to the instruction cache (data) memory 105 (via a ‘cache line data’ signal 133), where it is made available to the instruction parse/decode logic 107 of the processor. At the same time, the address from the control portion 137 of the prefetch queue 103 is transferred to the instruction cache (tag) memory 121 via a ‘tag fill data’ signal 141.

[0025] One skilled in the art will understand that the prefetch queue 103 can have a fixed number of entries, a variable number of entries, or otherwise. In a preferred embodiment, the prefetch queue 103 contains a fixed number of entries (for example, eight entries). Such a one will also understand that the data portion 135 of each entry can be implemented as storage that can contain the entire cache line, or as a pointer or index into a pool of buffers each of which can contain the entire cache line (for example, the buffer pool may contain two buffers). The actual details of the implementation of the invention are subject to performance and cost tradeoffs and encompass many variations not detailed here but understood by one skilled in the art.

[0026] Information in the cache line in the prefetch queue 103 is simultaneously written into the instruction cache (tag) memory 121 and the instruction cache (data) memory 105 when the cache line is at the head of the prefetch queue 103, the linefill operation has completed, and the previous entry at the head of the prefetch queue 103 has been processed.

[0027] Once the instruction cache line is loaded into the instruction cache (data) memory 105, each instruction within the cache line is available to the instruction parse/decode logic 107, which is configured to parse and decode the instruction. The parsed instruction is then executed by the instruction execute logic 109. If the executed instruction causes a change in the instruction execution path away from the predicted instruction execution path, the path predictor logic 113 is updated using the update/correct predictor path 111. This comprises a prediction modification mechanism that updates execution history information in the path predictor logic 113 so that future predictions are responsive to the execution history of the instructions within the instruction cache line.

[0028] Preferred methods for loading and emptying the prefetch queue 103, as well as adjusting the path predictor logic 113, are subsequently described with respect to FIG. 2, FIG. 3, and FIG. 4, respectively.

[0029] One skilled in the art will understand that many path prediction mechanisms can be used when implementing the path predictor logic 113. These include using a single-bit, multiple-bit, or correlated predictor state as is known in the art of branch prediction. In addition, the techniques relating to fetch-block predictions described by the application incorporated by reference can also be used.
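
As one concrete example of a multiple-bit predictor state from the branch-prediction art, a two-bit saturating counter per branch could be used. The sketch below is the generic textbook mechanism, not the specific predictor of the incorporated application.

```python
# Generic two-bit saturating counter predictor, one counter per branch
# address; this is standard branch-prediction art, not the patent's
# specific predictor.
counters = {}  # branch address -> counter in 0..3 (predict taken when >= 2)


def predict(branch_addr: int) -> bool:
    return counters.get(branch_addr, 1) >= 2


def update(branch_addr: int, taken: bool) -> None:
    state = counters.get(branch_addr, 1)
    counters[branch_addr] = min(3, state + 1) if taken else max(0, state - 1)


update(0x400, taken=True)
update(0x400, taken=True)
print(predict(0x400))  # -> True after two taken outcomes
```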

[0030] FIG. 2 illustrates a load prefetch queue process 200, used with the instruction processing architecture 100 of FIG. 1. The load prefetch queue process 200 initiates at a ‘start’ terminal 201 and continues to a ‘predict instruction path’ step 203. The ‘predict instruction path’ step 203 uses prediction techniques (as previously discussed) to predict which instructions will be executed based on execution history. An ‘identify upcoming instruction cache line’ step 205 determines the next expected cache line that contains instructions on the execution path. Some embodiments support a variable number of entries in the prefetch queue 103; other embodiments use a fixed number of entries. If a fixed number of entries can be in the prefetch queue 103, a ‘stall for prefetch queue entry’ step 207 stalls the load prefetch queue process 200 until an entry in the prefetch queue 103 becomes available. Once an entry becomes available (or if an entry was available), the load prefetch queue process 200 continues to an ‘add entry to prefetch queue’ step 209 that queues an entry at the tail of the prefetch queue 103. Next, a ‘cache line already in cache’ decision step 211 determines whether the cache line determined by the ‘identify upcoming instruction cache line’ step 205 is already resident in the instruction cache (data) memory 105. If the cache line is already resident in the instruction cache (data) memory 105, the load prefetch queue process 200 continues to a ‘load control field in prefetch queue’ step 213. The ‘load control field in prefetch queue’ step 213 loads the control portion 137 of the new entry with the HIT and the cache way number such that the prefetch queue entry identifies the cache line residing in the instruction cache (data) memory 105. The load prefetch queue process 200 then continues back to the ‘predict instruction path’ step 203 to predict the next cache line in the execution sequence.

[0031] However, if the ‘cache line already in cache’ decision step 211 determined that the cache line found by the ‘identify upcoming instruction cache line’ step 205 was not already in the instruction cache (data) memory 105, the load prefetch queue process 200 continues to a ‘stall for cache line buffer’ step 215. Each entry in the prefetch queue 103 includes the data portion 135. In some implementations, the data portion 135 can contain sufficient memory to hold a cache line; other implementations provide a limited pool of cache line buffers. For implementations that have a pool of cache line buffers, the ‘stall for cache line buffer’ step 215 stalls the load prefetch queue process 200 until one of the cache line buffers is free. One skilled in the art will understand that the ‘stall for cache line buffer’ step 215 is not needed if a cache line buffer exists for each entry in the prefetch queue 103. Once a cache line buffer becomes available, a ‘load buffer’ step 217 acquires the buffer and starts a memory transfer of the cache line from memory into the cache line buffer. The load prefetch queue process 200 continues back to the ‘predict instruction path’ step 203 to predict the next cache line in the execution sequence.

[0032] In one preferred embodiment, the ‘load buffer’ step 217 performs its function by acquiring a cache line buffer and initiating a linefill request to the memory system 101 directed toward the acquired cache line buffer.

[0033] Thus, the load prefetch queue process 200 queues up entries into the prefetch queue 103.
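
A hedged, single-iteration software rendering of the FIG. 2 flow might look as follows. The dictionary-shaped entries, the eight-entry depth, the two-buffer pool, and the helper callables are all assumptions, and the step ordering is slightly simplified relative to the figure; a hardware implementation would actually stall rather than return status strings.

```python
# Hypothetical one-iteration rendering of the FIG. 2 flow; in hardware the
# 'stall' returns would be actual stalls, and all names are invented.
import collections

QUEUE_DEPTH = 8                                # fixed-entry example [0025]
queue = collections.deque()                    # prefetch queue 103
free_buffers = [bytearray(32), bytearray(32)]  # example two-buffer pool


def load_prefetch_queue_step(predict_next_line, cache_lookup, start_linefill):
    line_addr = predict_next_line()              # steps 203 and 205
    hit, way = cache_lookup(line_addr)           # feeds decision step 211
    if len(queue) >= QUEUE_DEPTH:
        return "stall: no free queue entry"      # step 207
    entry = {"addr": line_addr, "hit": hit, "way": way,
             "buf": None, "fill_done": False}    # control portion, step 213
    if not hit:
        if not free_buffers:
            return "stall: no free line buffer"  # step 215
        entry["buf"] = free_buffers.pop()        # step 217: acquire buffer
        start_linefill(line_addr, entry["buf"])  # 'linefill request' 127
    queue.append(entry)                          # step 209: queue at tail
    return "queued"


status = load_prefetch_queue_step(
    predict_next_line=lambda: 0x2000,
    cache_lookup=lambda addr: (False, 0),
    start_linefill=lambda addr, buf: None)
print(status)  # -> 'queued'
```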

[0034] FIG. 3 illustrates a ‘load instruction cache process’ 300 that loads the instruction cache (data) memory 105 from the prefetch queue 103 (thus unloading the prefetch queue 103). The ‘load instruction cache process’ 300 initiates at a ‘start’ terminal 301 and continues to a ‘test head of prefetch queue’ step 303 that examines the control portion 137 of the head entry (for example, the entry at the position indicated by the prefetch queue entry 139) of the prefetch queue 103. A ‘cache line hit’ decision step 305 determines whether the cache line is already in the instruction cache (data) memory 105 by checking for a HIT in the control portion 137 of the prefetch queue entry 139. If the cache line is already in the instruction cache (data) memory 105, the ‘load instruction cache process’ 300 continues to an ‘advance prefetch queue’ step 307 to advance the prefetch queue 103 (thus moving another entry to the head of the prefetch queue 103 and making an entry available for the ‘stall for prefetch queue entry’ step 207).

[0035] However, if the ‘cache line hit’ decision step 305 did not detect a HIT, the ‘load instruction cache process’ 300 continues to a ‘buffer full’ decision step 309. The ‘buffer full’ decision step 309 determines whether the cache line transfer from memory initiated by the ‘load buffer’ step 217 has completed. If the transfer has not completed, the ‘load instruction cache process’ 300 waits for the cache line to be transferred. Once the cache line is transferred to the buffer, the ‘load instruction cache process’ 300 continues to a ‘copy buffer to instruction cache’ step 311 that copies the cache line from the buffer to the instruction cache (data) memory 105. Then a ‘release buffer’ step 313 releases the buffer (for embodiments that have a restricted number of buffers) for reuse by the ‘load buffer’ step 217, and the ‘load instruction cache process’ 300 continues to the ‘advance prefetch queue’ step 307.

[0036] Thus, entries are removed from the prefetch queue 103, and cache lines that are predicted to be needed and are not present in the instruction cache (data) memory 105 are loaded into the instruction cache (data) memory 105.
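
A matching sketch of the FIG. 3 drain flow is given below, using the same dictionary-shaped entries as the FIG. 2 sketch above; icache_fill and the status strings are illustrative assumptions.

```python
# Hypothetical rendering of the FIG. 3 flow (steps 303-313); entry fields
# match the FIG. 2 sketch above and all callables are invented stand-ins.
import collections


def unload_prefetch_queue_step(queue, icache_fill, free_buffers):
    if not queue:
        return "empty"
    head = queue[0]                          # step 303: test head entry
    if head["hit"]:                          # decision step 305
        queue.popleft()                      # step 307: advance the queue
        return "advanced: line already cached"
    if not head["fill_done"]:
        return "wait: linefill incomplete"   # decision step 309
    icache_fill(head["addr"], head["way"],
                bytes(head["buf"]))          # step 311: copy to cache 105
    free_buffers.append(head["buf"])         # step 313: release the buffer
    queue.popleft()                          # step 307: advance the queue
    return "filled and advanced"


q = collections.deque([{"addr": 0x2000, "hit": True, "way": 0,
                        "buf": None, "fill_done": False}])
print(unload_prefetch_queue_step(q, lambda a, w, d: None, []))
# -> 'advanced: line already cached'
```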

[0037] FIG. 4 illustrates an ‘instruction execution process’ 400 that initiates at a ‘start’ terminal 401 and continues to a ‘request instruction’ step 403. The ‘request instruction’ step 403 requests an instruction from the instruction cache (data) memory 105 using techniques known in the art. A ‘parse/decode instruction’ step 405 parses and decodes the instruction, and an ‘execute instruction’ step 407 executes the instruction. These steps are also well known in the art. An ‘update predictor’ step 409 examines the result of the execution of the instruction to determine whether the result of the instruction execution has changed the execution path from what was expected. If the instruction path changed from what was predicted, the path predictor logic 113 is modified by the update/correct predictor path 111 and the prefetch pointer logic 115 is modified to reflect the new execution path. In addition, the prefetch queue 103 is flushed to remove existing entries that were based on the previously predicted execution path.
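
The correction half of the ‘update predictor’ step 409 can be sketched as below; the correct_predictor callable and the state dictionary are hypothetical stand-ins for the corresponding FIG. 1 hardware.

```python
# Hedged sketch of the 'update predictor' step 409; every name here is a
# hypothetical stand-in for the corresponding FIG. 1 hardware.
def update_after_execute(predicted_next, actual_next, correct_predictor,
                         state):
    """Apply the misprediction handling described in paragraph [0037]."""
    if actual_next == predicted_next:
        return "prediction held"                # no corrective action
    correct_predictor(actual_next)              # update/correct path 111
    state["prefetch_pointer"] = actual_next     # redirect pointer logic 115
    state["queue"].clear()                      # flush stale queue entries
    return "mispredict: predictor corrected and queue flushed"


state = {"prefetch_pointer": 0x2000, "queue": [{"addr": 0x2000}]}
print(update_after_execute(0x2000, 0x3000, lambda a: None, state))
print(state)  # pointer redirected and queue emptied
```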

[0038] One skilled in the art will understand that the load prefetch queue process 200 does not show the steps used to initialize the instruction cache (data) memory 105, the instruction cache (tag) memory 121, or the path predictor logic 113. However, one skilled in the art will understand how to perform such initialization.

[0039] One skilled in the art will understand that the invention enables multiple linefill requests to be initiated to the memory system 101 before the instruction data contained in the requested cache lines is needed by the instruction parse/decode logic 107. Thus, the latency of accessing the memory system 101 has less impact on the performance of the processor, since such memory accesses overlap with the operation of the instruction parse/decode logic 107 on instructions that are earlier in the program flow. Thus, the invention reduces processing delays by preloading cache lines from memory based on a predicted execution path.

[0040] While preferred embodiments are disclosed herein, many variations are possible which remain within the concept and scope of the invention, and these variations would become clear to one of ordinary skill in the art after perusal of the specification, drawings, and claims herein.

What is claimed is:
1. A method for prefetching one or more instruction cache lines from a memory into an instruction cache, said method including steps of: fetching a set of one or more upcoming instruction cache lines from said memory to a prefetch queue; and loading one of said set from said prefetch queue into said instruction cache after said one of said set is completely fetched from said memory.
2. The method of claim 1 further including steps of: predicting an instruction execution path responsive to an address maintained by a prefetch pointer; identifying said one or more upcoming instruction cache lines on said instruction execution path; and determining which of said one or more upcoming instruction cache lines are to be fetched from said memory.

3. The method of claim 1 further including steps of: executing one or more instructions from said instruction cache; and updating a path predictor logic responsive to the step of executing.
4. The method of claim 1 wherein said prefetch queue has a fixed number of entries and said method further includes waiting for one of said fixed number of entries to become available prior to the step of loading.
5. The method of claim 1 wherein said prefetch queue has a fixed number of cache line buffers and said method further includes waiting for one of said fixed number of cache line buffers to become available prior to the step of loading.
6. The method of claim 1 wherein said prefetch queue includes a plurality of entries and the step of fetching further includes storing one of said one or more upcoming instruction cache lines in a buffer associated with one of said plurality of entries.
7. The method of claim 6 wherein the step of loading further includes steps of: determining whether said buffer contains said one of said one or more upcoming instruction cache lines; and copying said one of said one or more upcoming instruction cache lines from said buffer into said instruction cache.
8. The method of claim 1 wherein the method is performed within a computer.
9. An apparatus for prefetching one or more instruction cache lines from a memory, said apparatus including: a path predictor logic configured to predict an instruction execution path given an address; a prefetch pointer logic in communication with said memory, the path predictor logic, and an instruction cache tag memory; a prefetch queue, in communication with the prefetch pointer logic, said memory, and said instruction cache tag memory, configured to receive an upcoming instruction cache line from said memory and cache control information from said instruction cache tag memory; and an instruction cache data memory configured to receive said upcoming instruction cache line from the prefetch queue.
10. The apparatus of claim 9 wherein the prefetch pointer logic is configured to communicate said address in a prefetch pointer to the path predictor logic and to receive a predicted pointer identifying said instruction execution path from the path predictor logic, the prefetch pointer logic also being configured to use said instruction cache tag memory to determine whether said upcoming instruction cache line is to be fetched from said memory and, if so, to request said upcoming instruction cache line from said memory.
11. The apparatus of claim 9 further including: an instruction execution mechanism configured to execute an instruction received from the instruction cache data memory, the instruction execution mechanism in communication with the path predictor logic.
12. The apparatus of claim 11 wherein the path predictor logic further includes a prediction modification mechanism responsive to the instruction execution mechanism and configured to cause said path predictor logic to provide future predictions responsive to actually executed instructions.
13. The apparatus of claim 9 wherein the prefetch queue includes a plurality of entries, each including a control portion and a data portion.
14. The apparatus of claim 13 wherein said data portion can contain said upcoming instruction cache line.
15. The apparatus of claim 13 wherein said data portion references a buffer that can contain said upcoming instruction cache line.
16. The apparatus of claim 15 wherein said buffer is one of a plurality of buffers available to the prefetch queue.

17. The apparatus of claim 13 wherein the prefetch queue includes a fixed number of said plurality of entries.
18. The apparatus of claim 13 wherein the prefetch queue includes a variable number of said plurality of entries.